ABSTRACT

A new approach to the development of the item 
characteristic curve (ICC), which expresses the functional 
relationship between the level of performance on a given task and an 
independent variable that is relevant to the task, is presented. The 
approach focuses on knowledge states, decision processes, and other 
circumstances underlying responses to objective tests. Earlier work 
on finite state models of objective test performance provides the 
basis for deriving expressions for ICCs that directly account for 
factors such as examinee willingness to guess, mode of test 
administration, number of options per item, and response strategy of
the examinees. This approach uses a parameterization of ability 
different from that used in conventional item response theory (IRT) 
and yields ICCs that are polynomial functions of ability. The degree 
and coefficients of these polynomials depend in part on certain 
psychological/circumstantial factors. Examples are provided to 
demonstrate the means by which differing assumptions about objective 
test response strategies lead to variation in the shapes of the 
resulting ICCs. The advantages that IRT could gain from adoption of
these ICCs are discussed, and the work that remains to be done before
finite state polynomic ICCs can be used in practice is outlined. Some
possible extensions to the finite state approach are also discussed.
Abstract



A new approach to the development of the item characteristic curve (ICC) is presented, in
which knowledge states, decision processes and other circumstances underlying responding to 
objective tests receive a priori consideration. Earlier work on finite state models of objective 
test performance provides the basis for deriving expressions for ICCs that directly account for
factors such as examinee willingness to guess, mode of test administration, number of options 
per item, and the response strategy of the examinees. This approach utilizes a 
parameterization of ability different from that used in conventional item response theory 
(IRT) and yields ICCs that are polynomial functions of ability. The degree and coefficients of
these polynomials depend in part on psychological/circumstantial factors such as those just 
mentioned or others that may readily be introduced. Examples are provided to show how
differing assumptions about objective test responding lead to variation in the shapes of the 
resulting ICCs. The advantages that IRT could gain from adoption of these ICCs are 
discussed, and the work that remains to be done before finite state polynomic ICCs can be
used in practice is outlined. Some possible extensions to the finite state approach are also
discussed. 






Item Characteristic Curves: A New Theoretical Approach¹



Miguel A. García-Pérez
Universidad Complutense 
de Madrid 



Robert B. Frary
Virginia Polytechnic Institute
and State University 



The item characteristic curve (ICC) is a key element of item response theory (IRT). Broadly 
speaking, an ICC expresses the functional relationship between the level of performance on a 
given task and an independent variable that is relevant to the task. As applied to ability or 
achievement testing, where IRT emerged, the ICC expresses the probability of responding 
correctly to an item as a function of the examinee's (unobservable) ability or knowledge. 

Despite being a fundamental feature of IRT models, the true functional form of this relation- 
ship must remain unknown. Nevertheless, application of IRT requires the adoption of some 
mathematical form for the ICC. In Lord and Novick (1968, Section 16.8), some justification 
is provided for the two-parameter normal ogive, with a derivation of sufficient if rather re- 
strictive conditions for data to be consistent with this model. However, the conditions derived 
are by no means necessary ones, and Lord (1980, p. 30) stated a preference for considering
any particular ICC as representing a basic assumption in its own right, which must be justi-
fied empirically. Replacement of the normal ogive with the logistic function was motivated by
its ability to mimic the normal ogive while being more tractable mathematically (see Birn-
baum, 1968). The further development of the logistic function through the addition of the
pseudo-chance parameter might be said to be theory driven; it was assumed that examinees
of very low ability would guess essentially at random on multiple-choice items, resulting in a 
lower asymptote for the ICC at the probability of a correct guess under these conditions. No 
further applications of psychological theory have yielded fundamental changes in the 
mathematical form of ICCs adopted for large-scale IRT applications.

Hambleton and Swaminathan (1985, pp. 9-10) comment on the wide range of IRT models 
that can be operationalized by simply changing the mathematical form of the ICC but do not 
mention psychological considerations that might guide these changes. In keeping with Lord's 
philosophy, they only suggest testing the appropriateness of the choice by conducting good- 
ness-of-fit studies. Samejima (1981, p. 230) pointed out some criteria for choosing among 
various types of ICCs. However, her main conclusion was that the appropriateness of any of 
the proposed models depends largely on the guessing behavior of the examinees. McDonald 
(1982) approached this question more generally, proposing a framework from which many 
models can be generated by varying the cumulative distribution function to represent the ICC. 
He also offered no psychological criteria for deciding on the most realistic model, only point- 
ing out the need for statistical tests leading to acceptance or rejection of any particular model. 

The foregoing analysis led us to the conclusion that the logistic functions (or any other
conventional ICCs for that matter) do not embody psychological theory, in the sense that their
a priori appropriateness as ICCs does not follow from a formalization of the processes and
variables that are involved in responding to test items. In fact, support for their use only
comes a posteriori, once they have been shown to describe data adequately (with the help of
suitably estimated parameters). This pragmatic approach to justifying the choice of logistic
ICCs was evident in Lord's (1980, p. 31) assertion that "justification of their use is to be
sought in the results achieved, not in further rationalizations."

¹ An expanded version of this paper has been accepted for publication in the British Journal of
Mathematical and Statistical Psychology under the title, Finite State Polynomic Item
Characteristic Curves.

Contrary to prescribing the form of the ICC largely on the basis of expediency, we adopt 
here an explanatory approach to the generation of mathematical expressions that can be used 
as ICCs. Adopting a different parameterization of examinee ability and item difficulty, and
starting from (replaceable) assumptions about examinee behavior and item characteristics, 
finite state theory allows the derivation of expressions for the probability of responding cor- 
rectly to a test item. This approach produces ICCs that are primarily dependent on ability and 
difficulty, just as is the case for conventional ICCs. However, certain aspects of their mathe-
matical expression also depend on other variables. These variables, not incorporated into
conventional IRT models, include the number of options per item, examinee willingness to
guess when uncertain, the response strategy followed by the examinees, the format of admin- 
istration of the test, and potentially other item characteristics. This point is illustrated by 
providing ICCs for different situations. All of these ICCs are polynomial functions of ability,
and we refer to IRT models built around them as finite state polynomic models.

The goal in this paper is to introduce these new ICCs and to compare them with logistic ICCs 
from a number of points of view, with special attention to the different parameterizations 
underlying each type of ICC and their theoretical foundations. 

Finite State Theory and Finite State Polynomic ICCs

The assumptions and definitions underlying finite state modelling of objective test
performance have been thoroughly dealt with elsewhere (García-Pérez, 1987, 1989a, 1989b,
1990; García-Pérez & Frary, 1989). To avoid repetition, only a brief account of the theory
will be supplied here, which will suffice for the development to follow. However, a reading
of García-Pérez (1987) and García-Pérez and Frary (1989) will provide a more detailed
justification for some of the assumptions and definitions that will now be introduced.

The term "statement" is central to the finite state approach. In the context of multiple-choice
testing, a statement is any sentence resulting from adding to the item stem one of its available 
options. Finite state theory defines the level of knowledge of an examinee as the proportion 
of statements about a subject matter whose truth value he/she knows. This characterization of 
knowledge is akin to Falmagne and Doignon's (1988) definition of a knowledge state with 
respect to a body of information. In addition, the theory assumes that, when facing a multi-
ple-choice item, the examinee makes an independent attempt to classify every single available
option as true or false. This process gives rise to a finite set of knowledge states about the 
item, ranging from total ignorance through several degrees of partial knowledge to total 
knowledge. These states, in conjunction with the guessing strategy that the examinee adopts,
determine whether a conventional multiple-choice item will be answered correctly, answered
incorrectly, or left unanswered. 

When the process just described is translated into mathematical terms, expressions for the 
probability of these observable response outcomes can be derived. A two-stage modelling pro-
cess accomplishes this goal. First, an expression is adopted for the probability that the truth
value of a statement will be known. Then, expressions are derived for the probability that any 
given response outcome will occur. These two stages will now be dealt with separately. 

Probability of Knowing the Truth Value of a Statement 

Let λ (0≤λ≤1) be the true proportion of statements about the subject matter whose truth value
an examinee knows. This is the (unidimensional) latent variable representing the examinee's
level of knowledge or ability, and it is also the probability that he/she will know the truth
value of a randomly drawn statement in a multiple-choice item. We call it λ instead of the
usual θ in IRT for the sake of consistency with our previous work and to stress the fact that it
is not in some sense interchangeable with θ. Not only do λ and θ span different ranges, they
also have different relationships to performance, as will be seen. While λ is not the probabili-
ty of answering an item correctly, it is clearly related to this probability, as will be shown.
Also influencing the probability of answering correctly is the presence of topics in the subject
matter of interest, represented by items on the test, that are easier or more difficult than oth-
ers. Therefore, it would seem an oversimplification to assume that an examinee has a proba-
bility λ of knowing whether an option is true or false as applied to a given item stem, not
taking into consideration the difficulty of the question being asked. So let δ (0<δ<1) repre-
sent item difficulty with values closer to 1 the easier the item, as is the case for the classical
difficulty parameter. We call it δ instead of b, as is usual in logistic ICCs, because this pa-
rameter does not have the same meaning nor the same effect as b, as will be seen below. To
take item difficulty into account, we let the probability, p, that an examinee with ability λ
knows the truth value of an option in an item of difficulty δ be

$$p = \lambda^{(1-\delta)/\delta} \tag{1}$$

Figure 1a shows a three-dimensional plot of this power function of the inverse of item diffi-
culty, and Figure 1b shows sections of this function for items of selected difficulties. Note
that, for any given λ, p increases with decreasing item difficulty (increasing δ). Note also
that p>λ when δ>.5 and p<λ if δ<.5, while δ=.5 makes p=λ. Our choice of the func-
tional relationship of Equation 1 was limited to functions such that, as δ increases from 0 to
1 (though not taking on these extreme values), p increases gradually and monotonically from
a value of 0, attaining the value λ when δ=.5, and the value 1 when δ=1. It is similar to
the power functions that appear in psychophysics (Atkinson, 1982) and is consistent with
attempts to establish links between test theory and psychophysics (Mosier, 1940, 1941;
Hutchinson, 1977).
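The criteria just listed can be checked numerically. The following is a minimal sketch, assuming that Equation 1 takes the form p = λ^((1−δ)/δ), which is one power function of the inverse of item difficulty that meets those criteria; the function name p_know and its argument names are ours and purely illustrative.

```python
def p_know(lam: float, delta: float) -> float:
    """Probability of knowing the truth value of one option.

    Assumed form of Equation 1: p = lam ** ((1 - delta) / delta).
    lam   : examinee ability, 0 <= lam <= 1
    delta : item difficulty, 0 < delta < 1 (closer to 1 means easier)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("ability lam must lie in [0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("difficulty delta must lie strictly between 0 and 1")
    return lam ** ((1.0 - delta) / delta)


if __name__ == "__main__":
    # One row per item difficulty, one column per ability level.
    for delta in (0.9, 0.5, 0.1):
        row = [round(p_know(lam, delta), 3) for lam in (0.2, 0.5, 0.8)]
        print(f"delta={delta}: {row}")
```

Any other function meeting the same criteria could be substituted for the body of p_know without affecting the developments that take p as their starting point.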








Figure 1. (a) 3-D plot of Equation 1. The origin of coordinates is at the lower left corner of
the rhomboid over which the surface ascends. Examinee ability (λ) increases from 0 to 1
along the horizontal axis. Item difficulty (δ) increases from 0 to 1 along the 45° tilted
axis. The height of each point in the surface is the probability of correct classification of an
option in an item of difficulty δ by an examinee of ability λ as given by the plane coor-
dinates of its vertical projection. Reference grid lines are spaced at intervals of .2 units.

(b) 1-D cuts of the probability surface at several item difficulty values. From top to
bottom, the curves represent the probability of correctly classifying an option within an
item as a function of ability for items of difficulties δ=.9, .8, .7, .6, .5, .4, .3, .2, and .1.
It may be noted that Equation 1 is actually the ICC for a very simple type of item to which
examinees respond under very restricted conditions, namely, true-false items at which they
never guess. As such, this function was chosen arbitrarily (but in conformity with the criteria
just outlined). If true-false responses with omissions in the absence of knowledge could exist
in reality, there is then the question of how well they would fit the ICC of Equation 1. It is
possible that they would fit some other function better, for example, a bilinear function, such
as those used in Frary (1985) and in García-Pérez and Frary (1989). We leave open the
question of the appropriateness of Equation 1 but will adopt it for further development in this
paper, because it is plausible, mathematically tractable, and not at all critical to our main
argument. Any other function meeting the criteria outlined above could be used and would
be preferable if it led to a better fit of real data. What Equation 1 (or a substitute for it) rep-
resents is a prototypical ICC. We will show how it may be used as a "building block" in the
production of ICCs accounting for various sets of circumstances associated with multiple 
choice testing, circumstances that go beyond the simple case for which Equation 1 might be 
appropriate. 

Probability of Each Response Category to a Multiple-Choice Item 

To demonstrate the application of Equation 1, we will assume a set of conditions associated
with a multiple-choice test. These assumptions were chosen to specify a rather comprehensive
set of testing circumstances. However, to facilitate preliminary development, the assumptions
are basically simple. (As a result, some of them may not seem highly plausible, though they
are by no means impossible.) Following development of the ICC for these preliminary as- 
sumptions, various ones will be modified in turn in the next section, and the resulting ICCs 
will be derived. The preliminary assumptions are as follows: 

i local independence across items. 

ii independence of options. This means that options within an item must be independently
classifiable by examinees as if they actually were independent true-false items. Thus,
correct classification of fewer than all of the distractors must not lead the examinee to
infer what the correct answer is if he/she does not know it. Unavoidably, of course, cor-
rect classification of the answer must lead to classification of previously unclassified
options as distractors, but this situation is handled appropriately by our procedure, as
will be seen. 

iii test with three-option items. 

iv distractors that are equally attractive as the correct answer to each item. This means 
that, for a given examinee, the probability of being able to classify a randomly chosen 
correct option is the same as for a randomly chosen distractor. 

v examinee behavior such that the occurrence of (random) guessing among unclassified
options is determined only by each examinee's individual (overall) willingness to do so
irrespective of the number of distractors that he/she has identified on a particular item.
Individual differences exist in willingness to guess at random, so let γ (0≤γ≤1) repre-
sent this willingness as the probability that a (specific) examinee will guess at random
when the correct answer is not known.

vi conventional administration of the test, i.e., asking examinees to mark the alternative
believed to be correct for each item, but without advice as to how the test will be scored
or regarding the guessing strategy required for score optimization. (This lack of infor-
mation would be consistent with the guessing behavior assumed in Assumption v.)

As a consequence of all of the above assumptions, an examinee of ability λ has a probability
p of knowing whether each of the options of an item of difficulty δ is true or false. Thus,
he/she will be able to classify the answer and some number of distractors for the item depend-
ing on λ and δ, as is clear from Equation 1. But it may happen that this knowledge will be
insufficient to permit marking the correct answer with assurance. In this case, the examinee is
free to guess at random among the unclassified options.


With these considerations in mind, our task is to derive a mathematical expression for the
probability of getting the correct answer to such an item as a function of λ, γ, and δ. Use of
a tree diagram to describe the possible sequences of events when responding to that multiple-
choice item facilitates this task. The tree diagram for three-option items responded to under
the directions in Assumption vi has been presented and described in detail in García-Pérez
and Frary (1989, Figure 1) for the special case in which the probability of being able to clas-
sify each option is simply λ. Figure 2 is an adaptation, in which we have replaced λ with p
in accordance with the development above. Note that Assumptions ii-vi have been taken into
account in constructing this diagram. (Local independence across items is only needed to
collapse data from all items in the test.) Independence of options is necessary in order that the
links of the tree diagram signifying classification of options within a path be statistically in-
dependent. The three options in the item give rise to the eight-branch structure that is repre-
sented by the first three links. Four possible states of knowledge regarding the item arise
from this branching: correct classification of all three options (total knowledge), correct clas-
sification of two options only (high partial knowledge), correct classification of a single
option (low partial knowledge), and correct classification of no option (total ignorance). In
case of total knowledge, the examinee always gives the correct answer to the item. In case of
partial knowledge, Assumption iv leads us to apply a probability of k/n that the correct an-
swer to an n-option item is among k (0<k<n-1) classified options. If exactly n-1 options are
classified, then the correct answer is always given since it is either included among the classi-
fied options or it is the only option that is not identified as a distractor, which means that it is
the answer (disregarding the possibility of misinformation). If the correct answer is not
known in other cases of partial knowledge or in the case of total ignorance, the examinee,
according to Assumption v, may either guess at random (succeeding or failing) or leave the
item unanswered. Finally, Assumption vi results in these states of knowledge leading to three
possible response outcomes: correctly answered item, wrongly answered item, or unanswered
item, as shown to the right of each path by C, W, and U, respectively. Also shown to the
right of each path is the probability of that particular sequence of events, which is the product
of the probabilities of each link within that path. Since there are several sequences that result
in the same outcome, the sum of all probabilities of paths with the same result is the actual
probability of that outcome. Then, we get

$$c = p^3 + 3p^2(1-p) + p(1-p)^2 + p(1-p)^2\gamma + (1-p)^3\gamma/3, \tag{2a}$$
$$w = p(1-p)^2\gamma + 2(1-p)^3\gamma/3, \tag{2b}$$
$$u = 2p(1-p)^2(1-\gamma) + (1-p)^3(1-\gamma), \tag{2c}$$

[Figure 2: tree diagram. Column headings: Link # (1 to 6) and Outcome; each path ends in its
response outcome (C, W, or U) followed by the probability of that path.]

Figure 2. Tree diagram describing the possible sequences of events when responding to a test
item when Assumptions i-vi in the text hold. The first three links represent attempts at
classifying each option as right or wrong. In the paths where they appear, the fourth links
represent inclusion of the correct answer among the classified options, the fifth links repre-
sent decisions as to guessing, and the sixth links represent the outcomes of those guesses.
Each path results in a response outcome that is represented to the right of the path by either
C, W, or U. Also shown to the right of these letters is the probability of the sequence in
each path.
in which c, w, and u denote the probabilities of the response outcomes designated by the cor-
responding uppercase letters. Note that Equation 2a expresses the probability of responding
correctly to an item as an explicit function of ability and item difficulty (and, also, guessing
propensity). Therefore, like Equation 1, it is an ICC. Note also that Equation 2a embodies
Assumptions ii-vi, since they were used in the construction of the tree diagram from which
this equation comes. These assumptions cover the test administration format (Assumption vi),
the number of options per item (Assumption iii), the guessing behavior of examinees (As-
sumption v), and other item characteristics (Assumptions ii and iv). To illustrate the appear-
ance of this ICC, Figure 3 shows plots of c as a function of λ for items of various difficulties
and examinees of differing guessing propensities. Note that the probability of a correct answer
to the item increases with increasing guessing propensity and increases too with decreasing
item difficulty (increasing δ). Interestingly, not only does this procedure produce ICCs tai-
lored to the test format and the testing situation under consideration; it also gives rise to other
functions relating ability (λ) to the probability of a wrong response or the probability of leav-
ing the item unanswered. These functions could have important theoretical implications for
polychotomous response models. In the next section, we further illustrate the flexibility of
finite state modelling to derive proper ICCs by considering items for which different varia-
tions on Assumptions i-vi hold.

Figure 3. Finite state polynomic ICC given by Equation 2a. In each plot, curves represent,
from top to bottom, ICCs for items with δ=.9, .7, .5, .3, and .1. (a) γ=0. (b) γ=.33.
(c) γ=.67. (d) γ=1.
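The path-summing procedure that leads to Equations 2a-2c can be sketched in a few lines of code. The sketch below is an illustration under Assumptions ii-vi, not part of the original derivation: it groups the paths of the tree by the number k of classified options instead of listing them one by one, and the names outcome_probs and c_eq2a are ours. Exact fractions are used so that the comparison with the closed form of Equation 2a involves no rounding.

```python
from fractions import Fraction
from math import comb


def outcome_probs(p, gamma, n=3):
    """Return (c, w, u) for an n-option item under Assumptions ii-vi.

    p     : probability of classifying any single option
    gamma : probability of guessing when the answer is not identified
    """
    q = 1 - p
    c = w = u = 0
    for k in range(n + 1):                    # k = number of classified options
        state = comb(n, k) * p**k * q**(n - k)
        if k >= n - 1:                        # answer known outright or by elimination
            c += state
            continue
        known = Fraction(k, n)                # answer is among the k classified options
        c += state * known
        hidden = state * (1 - known)          # answer lies among n - k unclassified options
        c += hidden * gamma * Fraction(1, n - k)          # lucky guess
        w += hidden * gamma * Fraction(n - k - 1, n - k)  # unlucky guess
        u += hidden * (1 - gamma)                         # item left unanswered
    return c, w, u


def c_eq2a(p, gamma):
    """Closed form of Equation 2a (three-option items)."""
    q = 1 - p
    return p**3 + 3 * p**2 * q + p * q**2 + p * q**2 * gamma + q**3 * gamma / 3


if __name__ == "__main__":
    p, gamma = Fraction(2, 5), Fraction(1, 3)
    c, w, u = outcome_probs(p, gamma)
    print(c, w, u, c + w + u, c == c_eq2a(p, gamma))
```

Because the three outcomes exhaust the possibilities, c + w + u equals 1 for any p and γ, which provides a convenient check on any hand-derived set of polynomials.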

Finite State Polynomic ICCs for Other Situations

While not aiming to produce an atlas of polynomic ICCs, we explore here the consequences 
of varying the assumptions that led to Equations 2a-2c in the previous section. Our main goal 
is to show how various assumptions representing characteristics of the testing situation can be
incorporated into this procedure to derive matching ICCs. In order to make clear what these
effects are, we will modify each assumption in turn and produce corresponding ICCs for the
new situations. However, we will skip Assumptions i and ii. Local independence across items
is retained because, as noted earlier, it is required for collapsing data across all items in the 
test. Assumption ii concerning independence of options might be removed. As mentioned 
above, this assumption implies that correct classification of fewer than all of the distractors 
must not lead to correct classification of the answer when it is not known. Violation of this 
assumption could easily be handled by the model, but we will not consider this case here be- 
cause test items with this characteristic would be considered logically defective both by 
examinees and by score users. 

Assumption iii: Number of Options per Item

Let us assume that there are four rather than three options per item but that the other assump- 
tions listed above remain the same. This change has one consequence in the tree diagram, 
namely, that there are four rather than three links corresponding to options. The resulting tree 
diagram has 48 paths instead of the 19 paths of the diagram in Figure 2. We do not show it 
here due to its complexity, but upon constructing it, it can easily be seen that

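$$c = p^4 + 4p^3(1-p) + 3p^2(1-p)^2 + 3p^2(1-p)^2\gamma/2 + p(1-p)^3 + p(1-p)^3\gamma + (1-p)^4\gamma/4, \tag{3a}$$
$$w = 3p^2(1-p)^2\gamma/2 + 2p(1-p)^3\gamma + 3(1-p)^4\gamma/4, \tag{3b}$$
$$u = 3p^2(1-p)^2(1-\gamma) + 3p(1-p)^3(1-\gamma) + (1-p)^4(1-\gamma). \tag{3c}$$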

Figure 4 shows plots of Equation 3a for the same values of γ and δ as in the corresponding
plots for Equation 2a in Figure 3. Comparing Figures 3 and 4, it can be readily seen that any 
curve for 4-option items is always below the corresponding curve for 3-option items. That is, 
other things being equal, increasing the number of options in the item has the effect of lower- 
ing the probability of an examinee's correctly responding to it. 
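The ordering of the three-option and four-option curves can also be checked without plotting. The sketch below evaluates the closed forms of Equations 2a and 3a, as read from the tree diagrams, on a grid of p and γ values; the function names and the grid spacing are illustrative choices of ours. Since λ and δ enter only through p, a comparison at equal p is a comparison at equal ability and difficulty.

```python
from fractions import Fraction


def c_three(p, gamma):
    """Equation 2a: probability correct, three-option items."""
    q = 1 - p
    return p**3 + 3 * p**2 * q + p * q**2 + p * q**2 * gamma + q**3 * gamma / 3


def c_four(p, gamma):
    """Equation 3a: probability correct, four-option items."""
    q = 1 - p
    return (p**4 + 4 * p**3 * q + 3 * p**2 * q**2 + 3 * p**2 * q**2 * gamma / 2
            + p * q**3 + p * q**3 * gamma + q**4 * gamma / 4)


# Exact arithmetic on a 21 x 21 grid of (p, gamma) values in [0, 1].
grid = [Fraction(i, 20) for i in range(21)]
gaps = [c_three(p, g) - c_four(p, g) for p in grid for g in grid]
smallest_gap = min(gaps)

if __name__ == "__main__":
    print("smallest value of c_three - c_four on the grid:", smallest_gap)
```

On this grid the difference is never negative, and it vanishes only at the edges (p = 1, or p = 0 with γ = 0), where both curves coincide.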
Figure 4. Finite state polynomic ICC given by Equation 3a. In each plot, curves represent,
from top to bottom, ICCs for items with δ=.9, .7, .5, .3, and .1. (a) γ=0. (b) γ=.33.
(c) γ=.67. (d) γ=1.

Assumption iv: Identifiability of Distractors

Items are sometimes found one of whose distractors is much more readily classifiable than the
other options. Finite state modelling can easily accommodate items of this sort by assuming
that if a single option is classified, then it is a distractor. In the tree diagram of Figure 2, this
assumption means that the probability that a single classified option is the correct answer is
zero, while the probability of the correct answer being among k (1<k<n-1) classified options
remains k/n. When three-option items are considered (i.e., for n=3), only the first part of
this statement applies (since there is no integer k such that 1<k<2), and it results in branch-
es with probability 1/3 at the fourth link in Figure 2 being changed to 0 and branches with
probability 2/3, also at the fourth link, being changed to 1. After these modifications, including
the deletion of paths one of whose links has been assigned a probability of zero, we get

$$c = p^3 + 3p^2(1-p) + 3p(1-p)^2\gamma/2 + (1-p)^3\gamma/3, \tag{4a}$$
$$w = 3p(1-p)^2\gamma/2 + 2(1-p)^3\gamma/3, \tag{4b}$$
$$u = 3p(1-p)^2(1-\gamma) + (1-p)^3(1-\gamma). \tag{4c}$$

Figure 5 shows plots of Equation 4a for the same values of γ and δ as above. There are a
number of different assumptions regarding the identifiability of distractors that can be used in
place of either of the two we have considered here. An example of another can be found in
García-Pérez (1990).
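The effect of this change in Assumption iv can be made concrete with a short sketch. It evaluates Equations 4a-4c alongside Equation 2a, using exact fractions; the function names are ours, and the closed forms are those read from the modified tree diagram. Subtracting Equation 2a from Equation 4a term by term leaves p(1−p)²(γ/2 − 1), which is never positive, so under these formulas an easily spotted distractor lowers the probability of a correct response at any given p.

```python
from fractions import Fraction


def outcomes_eq4(p, gamma):
    """Equations 4a-4c: a single classified option is always a distractor."""
    q = 1 - p
    c = p**3 + 3 * p**2 * q + 3 * p * q**2 * gamma / 2 + q**3 * gamma / 3
    w = 3 * p * q**2 * gamma / 2 + 2 * q**3 * gamma / 3
    u = 3 * p * q**2 * (1 - gamma) + q**3 * (1 - gamma)
    return c, w, u


def c_eq2a(p, gamma):
    """Equation 2a: the original Assumption iv (equally attractive options)."""
    q = 1 - p
    return p**3 + 3 * p**2 * q + p * q**2 + p * q**2 * gamma + q**3 * gamma / 3


if __name__ == "__main__":
    p, gamma = Fraction(1, 2), Fraction(2, 3)
    c, w, u = outcomes_eq4(p, gamma)
    print("c, w, u:", c, w, u, "sum:", c + w + u)
    print("change relative to Equation 2a:", c - c_eq2a(p, gamma))
```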





Figure 5. Finite state polynomic ICC given by Equation 4a. In each plot, curves represent,
from top to bottom, ICCs for items with δ=.9, .7, .5, .3, and .1. (a) γ=0. (b) γ=.33.
(c) γ=.67. (d) γ=1.
Assumption v: Response Behavior

Now we will assume that the examinees take the test under directions to answer every item
regardless of knowledge. This practice will serve to eliminate a construct-irrelevant contami-
nant, namely, γ. If the examinees comply with these directions, all of them will behave as if
γ=1 regardless of anyone's particular willingness to guess. Under these circumstances, the
tree diagram is a simplified version of that in Figure 2 with γ=1 everywhere. There, only
two possible response outcomes remain, whose associated probabilities can be shown to be

$$c = p^3 + 3p^2(1-p) + 2p(1-p)^2 + (1-p)^3/3, \tag{5a}$$
$$w = p(1-p)^2 + 2(1-p)^3/3. \tag{5b}$$


Equations 5a and 5b are the same as Equations 2a and 2b when γ=1. (Also, γ=1 makes
u=0 in Equation 2c.) Therefore, the ICCs described by Equation 5a for selected values of δ
are the same as those arising from Equation 2a for examinees with γ=1 (see Figure 3d).
Note that the ICC represented in Equation 5a is the only one throughout this paper that ap-
plies to a dichotomous response model. As another example of how different response behav-
iors can be modelled using finite state methods, response behavior appropriate for formula
scoring has been considered in García-Pérez and Frary (1989).

Assumption vi: Format of Administration 

When the administration format varies, the main structure of the tree diagrams remains basi- 
cally the same, since it represents knowledge and bdiavior that are largely independent of the 
administration format. The only difference is in the assignment of paths to the response cate- 
gories that are possible under the particular format under consideration. We will illustrate this 
point by considering a test administered under answer-until-correct (AUC) directions. In this 
case, examinees continue selecting options until tb' correct answer is chosen (see Hanna, 
1975). Figure 6 shows the tree diagram for this situation. It differs from the diagram in Fig- 
ure 2 in that the guessing link has been omitted, since examinees behave as if y^l. Also, 
paths formerly leading to correct responses continue to do so as correct responses at the first 
attempt (C|). Some of those formerly leading to wrong responses now result in correct 
responses at the second attempt (C^), and some others result in correct responses at the third 
attempt (Q). Finally, all formerly unanswered items result now in Cj, or Cj (a correct 
response on the first, second or third attempt). Therefore, from Figure 6, 



in which Cj, C2, and C3 denote the probabilities of the response outcomes designated by the 
corresponding uppercase letters. Note that Equation 6a is the same as Equation 5a. This is 
because examinees make their first attempt under AUC directions under the same circum- 




(5a) 
(5b) 



c,=iP + 3p^(l-p) + 2p{Upf + {l-pf/3, 
C2 = pd-pf + (l-p)V3, 
C3 = (1-/?)V3, 



(6a) 
(6b) 
(6c) 
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stances as under conventional directions, with answer-eveiy-item behavior, regardless of the 
number of options per item. The curves in Figure 3d apply in this case too. However, Equa- 
tions 6b and 6c are also relevant, providing relationships between ability and probability of a 
correct response on the second and third attempts. Thus, AUC directions give rise to
polychotomous response models, while answer-every-item behavior under conventional
administration of the test results in a dichotomous response model. This fact has a substantial
bearing on parameter estimation; the outcome of a second attempt at answering a three-option
item provides information beyond that which is available when the test is administered under
answer-every-item directions. And, clearly, the number of additional sources of information
that can be used in parameter estimation increases with the number of options per item when
the AUC format of administration is considered instead of the conventional one.
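
As a numerical companion to Equations 6a-6c, the following minimal sketch evaluates the three outcome probabilities for a three-option item under AUC directions. It takes p (the probability of knowing the truth value of an option) as given rather than computing it from ability, and the function name `auc_probabilities` is an illustrative choice, not part of the original development. Exact fractions are used so that the identity $c_1 + c_2 + c_3 = 1$ can be checked without rounding error.

```python
from fractions import Fraction


def auc_probabilities(p):
    """Return (c1, c2, c3) for a three-option item under AUC directions.

    p is the probability that the truth value of any one option is known.
    The three expressions follow Equations 6a-6c term by term.
    """
    q = 1 - p
    c1 = p**3 + 3 * p**2 * q + 2 * p * q**2 + q**3 / 3  # correct on first attempt
    c2 = p * q**2 + q**3 / 3                            # correct on second attempt
    c3 = q**3 / 3                                       # correct on third attempt
    return c1, c2, c3


if __name__ == "__main__":
    for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
        c1, c2, c3 = auc_probabilities(p)
        # The three outcomes are exhaustive, so the probabilities sum to 1.
        print(f"p={p}: c1={c1}, c2={c2}, c3={c3}, total={c1 + c2 + c3}")
```

At p = 0 the three outcomes are equally likely (1/3 each), which is the pure-guessing case, and at p = 1 the first attempt is always correct.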



Figure 6. Tree diagram for the same situation as in Figure 2, except that the test is now
responded to under AUC directions. The only differences between this diagram and that in
Figure 2 are that the guessing link has been removed here since γ = 1 and that responding
with total ignorance may result in one among three response outcomes. Note, however,
that the response outcomes are different from those in Figure 2.

Comparison with Conventional ICCs

As shown in the foregoing development, finite state polynomic ICCs follow naturally from a
mathematical description of objective test performance that starts from a parameterization
different from that implicit in logistic ICCs. As a consequence, the number of parameters and
their meanings differ substantially between logistic ICCs and those arising from
finite state modelling. Although the two types of ICC should eventually be compared empirically,
this section is devoted to a theoretical analysis of the differences between the item and
examinee parameters of each type of ICC.

Polynomic δ versus Logistic b and a

The difficulty parameter in logistic ICCs is the point on the ability scale at which an examinee
has a probability of answering the item correctly that is half-way between the lower
asymptote and 1. It is clear, however, that δ does not have this interpretation, as revealed by
inspection of the plots in Figures 3-5. Further, variation in the logistic difficulty parameter
merely produces a horizontal displacement of the ICCs along the ability scale, while variation
of δ in finite state polynomic ICCs also varies the steepness of the curves (see Figures 3-5).
Therefore, δ in finite state polynomic ICCs accomplishes the same effects as both b and a
together in logistic ICCs.
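
The logistic side of this contrast can be checked numerically. The sketch below assumes the usual three-parameter logistic form, $c + (1-c)/(1 + e^{-a(\theta - b)})$, written without the 1.7 scaling constant that some presentations include; the function name `logistic_icc` is illustrative. It shows that the probability at θ = b is half-way between c and 1, and that changing b only shifts the curve horizontally.

```python
import math


def logistic_icc(theta, a, b, c=0.0):
    """Three-parameter logistic ICC (assumed form, no 1.7 scaling constant).

    a: discrimination, b: difficulty, c: lower asymptote.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    a, b, c = 1.3, 0.7, 0.2
    # At theta = b the curve is half-way between the lower asymptote and 1.
    print(logistic_icc(b, a, b, c), (1.0 + c) / 2.0)
    # Moving b by d and theta by d leaves the probability unchanged:
    # a change in b is a pure horizontal displacement of the curve.
    d = 0.5
    print(logistic_icc(1.0, a, b, c), logistic_icc(1.0 + d, a, b + d, c))
```

No comparable check is offered here for the polynomic curves, because the expression linking p to ability and δ (Equation 1) lies outside this section.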

This characteristic of δ can best be appreciated if one considers the different meanings assigned
to item difficulty and discrimination parameters in classical test theory as opposed to IRT.
In classical theory, item difficulty is defined as the ratio of the number of examinees who 
answer the item correctly to the number of examinees who attempt it, and item discrimination 
is related to the number of distinctions that can be made among examinees based on responses 
to the item. Obviously, these two parameters are not independent, and there is a well-known 
inverted-U-shaped relationship between them: the number of distinctions that can be made in- 
creases as item difficulty approaches a medium value, and it rapidly decreases as difficulty 
approaches either of its extreme values. Although very easy or very difficult items have low 
discriminating power from this point of view, it is also true that a fairly easy (alternatively, 
difficult) item, which will not serve to distinguish among examinees of high (alternatively, 
low) ability, can nonetheless be useful to distinguish among examinees of low (alternatively, 
high) ability. The classical item discrimination parameter is deficient in this sense, since it
fails to capture the fact that items may have the same discrimination capability, but at different
ability levels. Conventional IRT attempted to remedy this situation by adopting different
definitions for difficulty and discrimination that made these parameters independent of each 
other. The difficulty of an item was redefined as the ability needed to have a 50% chance of 
answering that item correctly, and item discriminating power was redefined to reflect the ac- 
curacy with which two examinees can be distinguished when they have abilities slightly above 
and below the ability designated as item difficulty. These two parameters are known to be 
related to the point of inflection and to the slope at this point on a conventional ICC, and 
either of these two values can be varied independently of the other.

Unlike the way either classical theory or IRT handles difficulty and discrimination, finite state
modelling implicitly assumes that the ability of an item to distinguish among examinees (i.e.,
its discriminating power) is a consequence of the interaction between the difficulty of the item
and the ability of the examinees who respond to it. As can be seen in Figures 3-5, the inflection
points of finite state polynomic ICCs move to higher positions on the ability scale as item
difficulty increases (δ decreases). Thus, for these ICCs, items discriminate (in an IRT sense)
at ability levels that are directly related to their difficulty. At the same time, the slopes at the
inflection points are such that the inverted-U-shaped relationship between the classical
difficulty and discrimination indices will hold.

Polynomic γ versus Logistic c

As Thissen and Steinberg (1986, Equation 5) show, the third parameter in logistic ICCs re- 
sults from assuming that the probability of guessing correctly is a fraction of the probability 
of really not knowing the answer. This fraction is nominally regarded as an item characteris- 
tic, with, for very low ability examinees, a maximum equal to the inverse of the number of 
options. However, lower values are often assigned to obtain better fits of the data. The need 
for this lowering in an item parameter has been assumed to reflect an examinee characteristic, 
namely, being misinformed or gullible with respect to certain distractors. Departures from the 
inverse of the number of options have also been attributed to differences in guessing propensity
on the part of the examinees by Mislevy and Bock (1982, pp. 727-728), who wrote that,
owing to these differences, "the Birnbaum three-parameter model for dichotomous items,
which posits for each item a guessing probability that is constant over all examinees, will be
in error." As they point out immediately afterwards, application of that model tends to over-reward
frequent guessers and under-reward examinees who tend to refrain from guessing. As
far as a comparison with the parameterization underlying finite state polynomic ICCs is concerned,
the important point is that the third parameter of logistic ICCs is forced to contain
both item and examinee components, despite being regarded nominally as only an item
parameter.

Finite state modelling treats these two influences separately. Willingness to guess is incorporated
as a second examinee parameter, γ, which converts the polynomic ICC into an item
characteristic surface. The polynomic ICC lower asymptotes are then determined for every
particular γ, with the lower asymptote reaching its minimum at 0 when γ = 0 and reaching its
maximum at the inverse of the number of options (see Figures 3-5) when γ = 1. Actually, it
makes little sense to speak about lower asymptotes at dimensional cuts of two-dimensional
functions, although any cross-sectional profile of the item characteristic surface at a selected γ
will render a true ICC, and it is helpful to keep this in mind.

Because finite state polynomic ICCs have γ as a second examinee parameter, the effects of
variations in guessing propensity on the part of the examinees can be removed from ability
and item parameter estimates. However, in two of the examples above, the contribution of the
second examinee parameter to the item characteristic surface was removed by assuming
compliance with instructions to answer every item. Accordingly, the item characteristic surfaces
were reduced to curves, and, at the same time, the possible effects of differing guessing
propensities were eliminated.

Polynomic λ versus Logistic θ

The finite state parameter λ represents ability in a metric that is directly interpretable in a
psychological sense, namely, as the amount of knowledge or ability that the examinee has.
More specifically, the use of λ is consistent with the remarks of Glaser (1981, p. 935) supporting
the use of criterion-referenced testing and expressing "concern for making test scores
informative about behavior rather than about relative performance on poorly specified
dimensions" (italics added). Indeed, the suitability of λ for use in criterion-referenced testing has
been addressed in García-Pérez (1989b), where the number of items required for arriving at
a mastery decision was determined as a function of examinee response strategy, number of
options per item, and test administration format.

The importance of these features of λ (and of finite state polynomic ICCs) can be realized by
considering the extent to which θ can be interpreted as a ratio or even interval measure. Lord
and Novick (1968, p. 369) point out that "whenever any single item characteristic curve is a
monotonic increasing function of θ, it is always possible to transform θ monotonically so that
the characteristic curve becomes a normal ogive." The transformed θ is then uninterpretable
in any direct psychological sense. Moreover, it is important to realize that this transformation
is implicitly and unavoidably made whenever parameters are estimated by fitting data to the
normal ogive (or the logistic function). The θs that then result from transformations
constrained only to be monotonic can only be interpreted with assurance in an ordinal sense.

Additional Benefits of Using Finite State Polynomic ICCs

As shown in the two preceding sections, finite state polynomic ICCs incorporate characteristics
of the examinees, the items, and the format of administration of the test in a realistic
manner. Also, finite state polynomic models give rise to a measure of ability that is directly
interpretable psychologically. But psychological realism and interpretability of λ are not the
only practical advantages that can be gained from using finite state polynomic ICCs in place
of logistic ones. It is these additional advantages that we consider in this section.

Applicability of IRT Methods to Any Format of Administration of the Test 

The mathematical expressions derived using finite state theory are tailored to match any poss- 
ible specification of the factors represented in Assumptions ii-vi above as they apply to a 
given test, thus allowing IRT methods to be used with tests administered under any format. 
One advantage of this fact can be illustrated in the context of the concern with parallel tests 
on the part of classical test theorists, and the solution provided by IRT to cope with nonparallel
test forms. Suppose the same test were administered under two different formats to the
same examinees. Then the two administrations would be strictly parallel (disregarding learning
during the test), but two different score distributions would result. The discrepancies
between them would result only from the differences in the format of administration, since the 
same examinees and items were involved in both cases. Obviously, these score distributions 
do not provide direct information about the abilities of the examinees in the group. This is be- 
cause each format of administration of the test (potentially) gives rise to different and 
noncomparable response outcomes that are differently related to ability. Under such circumstances,
being able to recover the abilities from either of these two score distributions depends
on the availability of an appropriate theoretical framework that conveniently accounts for the
differences between the administration formats and that prescribes procedures for estimating
those abilities from either of the score distributions.

Conventional IRT would apply the same type of ICCs in both cases, which would require
changes in either the examinee or the item parameters, despite the fact that the same
examinees and items are involved in both cases. Unlike this approach, finite state theory offers
the needed theoretical framework and supplies the (different) ICCs that should be used in
each case to arrive at the same characterization of examinees and items in terms of their
parameters. That scoring methods derived from finite state theory are capable of accomplishing
this goal has been confirmed in a dual administration of a test to the same examinees under
both conventional and Coombs-type directions (García-Pérez, 1987).

One other advantage of finite state theory as a tool for deriving ICCs is that it yields equations
for polychotomous response models as readily as for dichotomous models. This characteristic
greatly facilitates the application of IRT to new and varied test administration formats.
It also provides a new methodology for the study of the interaction between test format and
examinee behaviors with the goal of increased accuracy in the estimation of ability.

Simplified Parameter Estimation 

A thorough discussion of parameter estimation for finite state polynomic IRT models would 
require a separate and lengthy paper. Therefore, in what follows, only major points
characterizing these models are presented.

The first of these is that the metric of λ provides unambiguous ability estimates in the case of
perfect and zero scores. Unlike what happens with the unbounded θ in logistic ICCs, these
scores will result directly in λ = 1 and λ = 0, respectively.

The adaptation of conventional IRT algorithms to the estimation of item and examinee
parameters in finite state polynomic models has not as yet been addressed. Nevertheless, this work
should not be difficult to carry out, as only a replacement of the mathematical expression to
represent the ICC is involved. Moreover, Riefer and Batchelder (1988) have shown how easy
it should be to obtain maximum likelihood point estimates and confidence intervals for the
parameters of any finite state model.

These concerns aside, very simple methods for the estimation of λ are already available that
have proven to yield accurate estimates (see García-Pérez, 1989a; García-Pérez and Frary,
1989). Taking the set of expressions for the probability of every response outcome that arises
in a given situation as a system of nonlinear equations, these methods merely involve solving
for λ once every probability has been replaced with the empirical proportion of items
answered by the examinee in the corresponding response category. In practice, this amounts to
finding the single root in the interval [0,1] of what García-Pérez and Frary (1989) called a
"scoring polynomial" that is derived from the set of equations under consideration.
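
The root-finding step can be illustrated with a deliberately simplified sketch. It is not the scoring polynomial of García-Pérez and Frary (1989): that polynomial is written in λ and draws on the full set of outcome equations, whereas this example solves only Equation 6a for p, because the expression linking p to λ lies outside this section. The function names and the bisection routine are illustrative assumptions. Bisection applies because the right-hand side of Equation 6a rises monotonically from 1/3 at p = 0 to 1 at p = 1.

```python
def c1_auc(p):
    """Probability of a correct first attempt, three options, AUC (Equation 6a)."""
    q = 1.0 - p
    return p**3 + 3.0 * p**2 * q + 2.0 * p * q**2 + q**3 / 3.0


def solve_for_p(observed_c1, tol=1e-10):
    """Find the p in [0, 1] whose predicted c1 equals the observed proportion.

    observed_c1 is the proportion of items answered correctly at the first
    attempt. Values below 1/3 have no root in [0, 1] and are rejected.
    """
    if not 1.0 / 3.0 <= observed_c1 <= 1.0:
        raise ValueError("observed proportion must lie in [1/3, 1]")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if c1_auc(mid) < observed_c1:
            lo = mid  # predicted proportion too small: root lies to the right
        else:
            hi = mid  # predicted proportion large enough: root lies to the left
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # An examinee with 19 of 24 items correct at the first attempt.
    print(solve_for_p(19.0 / 24.0))
```

An observed proportion below 1/3 is possible through sampling error; how such cases are handled in the published methods is not specified here, so the sketch simply raises an error.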

Regardless of the approach that is adopted for the estimation of parameters, it is also clear 
that those procedures will have to be adapted to every particular ICC that finite state theory 
produces, with special consideration of dichotomous versus polychotomous models. As was
pointed out in the last paragraph of the previous section, the properties of the estimates obtained
in each case could be taken as a basis for deciding on the optimal administration format
for the maximization of accuracy in parameter estimation. 

Avoidance of ICCs that Cross

An important side effect of the way polynomic ICCs handle difficulty and discrimination is 
that any two with the same c-intercept will not cross. This may be verified by noting that p 
in Equation 1 is an increasing function of δ and that all of those ICCs are increasing functions
of p. Hence finite state polynomic ICCs increase monotonically with increasing δ (i.e.,
the probability of success decreases monotonically with increasing item difficulty). In the
case of logistic ICCs, it has been shown analytically by Sijtsma (1988, p. 64) that any two
with differing discriminating power must cross. This crossing often occurs at extreme values
of θ, but, even if the crossing occurs at a θ within, say, [-2, 2], it is often the practice to
adopt such two- or three-parameter logistic ICCs when they fit the data better than one-pa- 
rameter logistic ICCs. 
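
The crossing of logistic ICCs can be made concrete for the two-parameter case. Two such curves are equal exactly where their exponents agree, that is, where $a_1(\theta - b_1) = a_2(\theta - b_2)$, which gives $\theta^* = (a_1 b_1 - a_2 b_2)/(a_1 - a_2)$ whenever $a_1 \neq a_2$. The sketch below assumes the two-parameter logistic form without a scaling constant; the function names and the parameter values are illustrative.

```python
import math


def two_pl(theta, a, b):
    """Two-parameter logistic ICC (assumed form, no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def crossing_point(a1, b1, a2, b2):
    """Ability at which two 2PL ICCs with different discriminations intersect."""
    if a1 == a2:
        raise ValueError("equal discriminations: the curves coincide or never cross")
    return (a1 * b1 - a2 * b2) / (a1 - a2)


if __name__ == "__main__":
    a1, b1 = 1.0, 0.0   # flatter, easier item
    a2, b2 = 2.0, 0.5   # steeper, harder item
    theta_star = crossing_point(a1, b1, a2, b2)
    print(theta_star, two_pl(theta_star, a1, b1), two_pl(theta_star, a2, b2))
    # Below the crossing the first item has the higher success probability;
    # above it the order is reversed.
    print(two_pl(0.0, a1, b1) > two_pl(0.0, a2, b2))
    print(two_pl(2.0, a1, b1) < two_pl(2.0, a2, b2))
```

With these values the curves meet at θ = 1, inside the range [-2, 2] mentioned above, so the nominally harder item becomes the easier one for examinees above that point.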

In many cases, however, the ubiquitous (so-called "empirical") ICCs that cross may be only 
the result of applying very powerful curve-fitting techniques to obtain two- or three-parameter 
logistic functions with differing values of a, which, therefore, must cross. In other words, it 
is the decision to fit the data to a mathematical function permitting the curves to cross that 
makes estimated ICCs actually cross, sometimes at a θ within [-2, 2]. To see how this might
happen, suppose that the polynomic ICCs in Figure 3a hold and that responses to the items
with δs of .5 and .9 are collected. From the shape of the true ICCs, it is clear that fitting
these data to logistic curves will yield much poorer results for the one-parameter function than
for the two-parameter function. This is because the true ICCs differ somewhat in slope, which
will in turn allow a two-parameter function-fitting algorithm that capitalizes on chance to 
yield different values of a for each item. As a result, their estimated two-parameter ICCs 
must cross. Hence, artifactually, these items would be considered as evidence that "empirical"
ICCs do cross, but in a situation in which the true ICCs do not cross.

Apart from this concern, the question of whether to model data with functions that are or are
not allowed to cross is theoretical and by no means empirical in nature. We believe that there
are strong reasons to prefer as ICCs functions that do not cross. As Wright (1977, p. 103)
pointed out in support of the Rasch model, "...we want to think that the probability of success
on the harder of two items should always be less than the probability of success on the
easier, no matter who attempts the items." This property is ensured only when ICCs do not
cross. At the same time, only an item difficulty parameter arising from a framework that
yields noncrossing ICCs is well suited to conveying quantitative information about item location
in a body of knowledge whose structure can be described by a quasi order, such as that
considered by Falmagne and Doignon (1988).

Proper Treatment of Omissions, Guessing, and Partial Knowledge

Conventional ICCs only provide an expression for the probability of getting an item right,
which tends to deemphasize the fact that nonright responses can occur in at least two response
categories: wrong responses and omissions. As a result, in practice, the treatment of omissions
in conventional IRT is limited to categorizing them as either wrong responses or as
partially correct responses valued at the inverse of the number of options. Situations exist for
which neither treatment would be appropriate. For example, many standardized tests of
educational achievement are administered under instructions which indicate that examinees should
guess when uncertain regardless of their perceived knowledge. Yet numerous omissions occur,
presumably due to lack of examinee motivation or failure to attend to the instructions. To
assume that all (or almost all) such omissions reflect total ignorance is certainly questionable.
Yet this assumption is required to value omissions at the inverse of the number of options. On
the other hand, treatment of omissions as wrong responses would penalize examinees refraining
from guessing. Although Lord (1983) proposed a model incorporating a true guessing
parameter to account for omissions, available computer programs such as LOGIST
(Wingersky, Barton, & Lord, 1982) or PC-BILOG (Mislevy & Bock, 1986) still limit the
treatment of omissions to the two choices mentioned above.

Unlike this approach, finite state theory provides an expression for the probability of omitting 
at the same time that it gives expressions for the probability of getting an item right or 
wrong, save in cases where omissions do not occur. Hence, finite state theory provides for a 
proper treatment of omissions under a polychotomous response model. The reason that this is 
so is that finite state theory models the response behavior appropriately, establishing the
contribution of each knowledge state to the probability of each possible response outcome. This
can easily be seen by inspection of the right-hand sides of Equations 2-4, where omissions are
shown to be the result of failures to guess in cases of total ignorance and partial knowledge.
Also, correct and wrong responses resulting from guessing (with various degrees of partial
knowledge) occur with the probabilities given by the addends in which γ is a factor, and correct
responses resulting from total or partial knowledge have the probabilities given by the
remaining addends. Thus, finite state theory allows a distinction to be made between two
different events and their associated probabilities: knowing what the correct answer to an item
is, and getting a correct response (by either knowledge or guessing).

Although the distinction between these events and the different cases of partial knowledge
underlying them is not usually considered in the use of conventional ICCs, it should be noted
that Waller (1989) suggested that the three-parameter logistic ICC can be decomposed into
two components which, in turn, can be interpreted as carrying information about the probability
of a correct response based on knowledge or as a result of guessing. Under this interpretation,
the probability of a correct response with assured knowledge will be provided by a
two-parameter logistic ICC, while the probability of a correct response from guessing with
partial knowledge may be obtained as the difference between this two-parameter ICC and a
three-parameter ICC. Finite state polynomic ICCs, instead, model this decomposition explicitly,
as specified by the additive terms representing these cases in the functions themselves.

Provision for Independent Tests of Fit 

In the introduction, we referred to the suggestion that an ICC should be considered a basic as- 
sumption to be tested through goodness-of-fit studies. In this context, it is worth noting that 
the finite state theory approach to deriving ICCs provides the means for tests of fit in a way 
that is basically different from the conventional curve-fitting strategy used with logistic ICCs.

Testing the fit of a conventional ICC to data as indicated by a convenient goodness-of-fit
statistic raises a fundamental contradiction, since the goodness of the fit is measured after
parameters have been estimated under the assumption that the model is actually correct. One may
wonder to what extent these artifactual estimates force the fit, since it is well known that a 
good fit can be found even when the source model has nothing to do with the fitted model 
(see Wood, 1978). Put another way, conventional ICCs cannot really be tested against data,
but only fitted to them, since there is no possibility of testing the adequacy of a logistic ICC 
independently of estimating model parameters. The main reason that this is so is that there are 
no derivable predictions from logistic ICCs against which empirical data can be contrasted by 
measuring their agreement with the theoretical expectations. 

In contrast, finite state polynomic ICCs can be tested independently. This is true because
finite state theory provides expressions for the probability of every possible response outcome
to an item. It is this fact that results in testable predictions regarding the relationships among
the proportions of responses falling into each response category, predictions that can be tested
without recourse to estimating model parameters. Thus, finite state polynomic ICCs have an
advantage over conventional ones with respect to Marascuilo's (1988) complaint that models
are more often fitted to data than tested against data, since they lead to goodness-of-fit studies
and parameter estimation algorithms that are independent of each other.
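
One way to picture a prediction of this kind uses the three-option AUC case of Equations 6a-6c. This is an illustration constructed for this passage, not necessarily the test the authors have in mind. Subtracting the equations gives $c_2 - c_3 = p(1-p)^2 \ge 0$ and $c_1 - c_2 = p^3 + 3p^2(1-p) + p(1-p)^2 \ge 0$, so the model predicts $c_1 \ge c_2 \ge c_3$ for every p. Eliminating p also ties the second-attempt probability to the third-attempt probability: with $q = (3c_3)^{1/3}$, the model requires $c_2 = q^2 - \tfrac{2}{3}q^3$. The function names below are hypothetical, and a formal test would need a statistic that allows for sampling error, which is not shown.

```python
def ordering_holds(n1, n2, n3):
    """Check the predicted ordering of first-, second- and third-attempt counts."""
    return n1 >= n2 >= n3


def predicted_c2_from_c3(c3):
    """Second-attempt probability implied by the third-attempt probability.

    Follows from Equations 6b and 6c after eliminating p:
    q = (3 * c3) ** (1/3) and c2 = q**2 - (2/3) * q**3.
    """
    if not 0.0 <= c3 <= 1.0 / 3.0:
        raise ValueError("c3 must lie in [0, 1/3] under the model")
    q = (3.0 * c3) ** (1.0 / 3.0)
    return q**2 - (2.0 / 3.0) * q**3


if __name__ == "__main__":
    # Counts for one examinee over 24 items: 19, 4 and 1 correct responses
    # at the first, second and third attempt, respectively.
    n1, n2, n3 = 19, 4, 1
    total = n1 + n2 + n3
    print(ordering_holds(n1, n2, n3))
    print(n2 / total, predicted_c2_from_c3(n3 / total))
```

In this constructed example the observed second-attempt proportion (4/24) coincides with the value implied by the third-attempt proportion, as the model requires; real data would agree only up to sampling error.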

Further, a major and hard-to-avoid pitfall in the application of conventional IRT to practical
problems is the need to discard items that do not fit the assumed model. The fit of logistic
ICCs to a range of empirical data is accomplished by varying the parameters of a single
functional expression. In assessing model-data fit, the underlying question is: can reasonable
parameters be found such that most of the items pass a goodness-of-fit test? To answer this
question, very powerful curve-fitting algorithms are applied in a huge parameter space. Under
these circumstances it is not surprising that relatively few items happen to fail the
goodness-of-fit test. Nevertheless, even though the goodness-of-fit test nominally pertains to the
adequacy of the ICC as an assumption in its own right, it is common practice to discard the item
rather than the ICC when the fit is poor. This practice risks producing a test in which items
selected on the basis of a statistical criterion may be educationally or psychologically
inappropriate (see Goldstein, 1979). This potential outcome occurs as a result of using the same
mathematical expression for the ICC of every item in the test, a practice which, in turn,
results from being able to employ only a very restricted set of parameters in tailoring
conventional ICCs to account for differences among items. Indeed, it simply may not be possible to
account for these differences in terms of those parameters.

Finite state polynomic ICCs are not limited in this way. Finite state theory accommodates
employing a variety of (valid) assumptions that could permit keeping educationally or
psychologically relevant items in a test by properly accounting for their peculiarities.
Application of IRT only requires that each item be described by an ICC, with no need for all of
them to have the same mathematical form. Although this variation within a test may complicate
the estimation procedures, adoption of finite state theory will serve the more critical goal
of providing a technique for handling the items that have been chosen to assess the desired
educational objectives. By supplying a framework within which the concept of poor-fitting
items can be replaced by the more plausible one of inappropriate ICCs, adoption of finite
state methods can provide test practitioners with a tool for accomplishing Goldstein's (1979,
p. 220) recommended shift of emphasis "towards a development of quantitative assessment
techniques which are firmly rooted in qualitative educational objectives."

Discussion 

Behind the surface aspects of any psychometric model lies the underlying philosophy of its
proponents about how models should be constructed and what should be demanded of them.
As for ourselves, what we seek in a model of performance on objective tests is that it be
psychologically realistic and directly account for the processes involved in responding to an item
(as opposed to those involved in arriving at the response itself). This position motivates the
following discussion of the theoretical foundations of finite state polynomic versus logistic
ICCs in the context of recently expressed concern about the psychological realism of
mathematical models and about the explanatory role of mathematics in psychology.

Perhaps the most appealing feature of polynomic ICCs as compared with their conventional
counterparts is that the former are derived from an operational definition of knowledge level
and a set of realistic assumptions about how test items are constructed and how examinees
behave in responding to them, whereas the latter lack this underpinning. At the core of this
difference is the distinction made by Coombs (1983, p. 15) between the descriptive and
explanatory role of mathematics in psychology: "The descriptive use of mathematics does not
seek to explain an empirical generalization by deducing it from basic (axiomatic) properties of
the empirical system. In its explanatory role, mathematics can be used to show that an
empirical generalization must necessarily hold." Further advantages of finite state modelling of
psychological processes have been discussed in Riefer and Batchelder (1988).

This theory-based approach to developing ICCs is consistent with the recommendations of
several authors who have discussed the shortcomings of other approaches. It is clearly in line
with Molenaar's (1981, p. 228) request that the role of psychology be dominant over
mathematics and statistics in the development of models for achievement testing. Similarly, it
conforms with the preference expressed by several authors (e.g., Loftus, 1985; Freedman, 1985,
1987; Marascuilo, 1988) for mathematical models connected to a theoretical framework,
rather than simply consisting of a set of equations that data are conveniently found to fit.
Indeed, there is more to constructing a psychological model than just getting a good fit.

Extensions to the Finite State Approach 

The finite state approach we have dealt with in this paper is a general framework capable of 
some further improvements in the direction of increased psychological realism or comprehen- 
siveness, three of which we will now outline briefly. 

First, as is the usual practice with conventional ICCs, we have not considered the possibility 
that examinees are misinformed. However, an assumption to this effect would be easy to 
incorporate realistically into the finite state framework. Toward this end, the options in the 
item pool would have to be divided into three sets: those whose truth value the examinee 
knows, those whose truth value the examinee ignores, and those about which the examinee is 
misinformed. This leads to considering a third examinee parameter, μ, to represent the
probability of being misinformed about a given option. In this case, the constraints on these
parameters are 0 ≤ μ ≤ 1 and 0 ≤ λ ≤ 1 − μ. As a result, a third branch would arise from the nodes
representing each option in the tree diagrams, the branches now having base probabilities of μ,
[...] and [...]. This additional parameter will result in somewhat more complicated
parameter-estimation procedures. Nonetheless, as is the case with the fourth parameter in four-parameter
logistic functions, μ may often have such a small value that its use in finite state polynomic
ICCs may not be worth the trouble.

Second, we have thus far assumed that δ applies to an item and, hence, to each of its options.
An alternative view, possibly resulting in increased realism, would be to regard each option
as having its own distinct difficulty level. Then each option within an item might have a
different value of δ, which in turn would result in different ps for each option. This extension of
the model would have two different, but related, theoretical implications: it would allow the
model to account realistically for the fact that some options are more easily recognized as
distractors and, hence, less frequently chosen than others, and it would allow item option
characteristic curves to be derived for every distractor in an item administered under conventional
response directions (as opposed to allowing only the derivation of curves reflecting the
probabilities of answering correctly, answering incorrectly or omitting). Thus, as is the case in
conventional IRT with the work of Bock (1972) and others, finite state theory is capable of
producing IRT models for the nominal response case.

Third, the finite state approach can be extended in the context of speeded tests by the 
straightforward addition of a speed parameter. An illustration of how this could be 
accomplished can be found in Link (1982), who used a finite state approach to deriving methods 
for analysing response times to correct and wrong responses in experiments involving yes-no 
questions. This framework, either with or without the addition of difficulty parameters, can 
be directly applied to true-false tests, and it can be further combined with finite state 
theory to yield models in which both the type of the response and the time it takes the 
examinee to give it are considered. 

A Research Agenda for the Development of Finite State Polynomic IRT Models 

The main goal of this paper was to present a new kind of ICC that arises as an extension of 
finite state theory, a methodology that has already proven useful in modelling performance in 
objective tests. To make finite state polynomic ICCs usable in practice, the first thing to do 
is to make procedures available for the estimation of model parameters from responses to 
objective tests meeting any particular set of conditions. Although, as noted above, this 
problem has largely been solved with respect to λ (García-Pérez, 1987; García-Pérez & Frary, 
1989, 1991), the estimation of δ still has to be addressed. 

There are a number of statistics available for measuring the goodness of the fit of a data set to 
a multinomial model. From the conventional approach to assessing model-data fit in IRT, it 
has been shown that some of them are more adequate than others (see McKinley & Mills, 
1985). As finite state polynomic ICCs allow addressing the issue of model-data fit differently, 
it remains to be seen which statistics are better for that purpose. Another important line of 
work will have to do with the empirical comparison of finite state polynomic versus logistic 
ICCs, not only as to their capability of fitting data but also, and more important, with respect 
to the predictive validity of the resulting scores. 
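
As a rough illustration of what measuring the fit of a data set to a multinomial model 
involves, the sketch below computes two generic statistics, Pearson's chi-square and the 
likelihood-ratio statistic, for observed response counts against model-predicted category 
probabilities. These are offered only as familiar examples; they are not claimed to be the 
statistics compared in the work cited above, and the function names and the numbers in the 
usage lines are hypothetical. 

```python
from math import log


def pearson_x2(observed, expected_probs):
    """Pearson chi-square: sum over categories of (O - E)^2 / E."""
    n = sum(observed)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs)
    )


def likelihood_ratio_g2(observed, expected_probs):
    """Likelihood-ratio statistic: 2 * sum of O * ln(O / E).

    Categories with a zero observed count contribute nothing to the sum.
    """
    n = sum(observed)
    return 2.0 * sum(
        o * log(o / (n * p))
        for o, p in zip(observed, expected_probs)
        if o > 0
    )


if __name__ == "__main__":
    # Hypothetical counts of correct, incorrect and omitted responses to one
    # item, and hypothetical model-predicted probabilities for each category.
    counts = [60, 25, 15]
    model_probs = [0.5, 0.3, 0.2]
    print(pearson_x2(counts, model_probs))
    print(likelihood_ratio_g2(counts, model_probs))
```

Both statistics are zero when the observed proportions equal the predicted probabilities and 
grow as the two diverge. Which statistic is more appropriate for finite state polynomic ICCs 
is exactly the open question raised above. 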

In addition, and putting together parts of what was mentioned above, use of finite state poly- 
nomic ICCs would require the development of a complete model for any situation at hand. 
Development of a spectrum of models for differing objective testing situations (e.g., with 
varying numbers of options per item and administered under various formats) would allow a 
comparison among them to be made with an eye toward maximizing the amount of informa- 
tion about the examinees that is obtained when a set of items is administered. 